Agenda

  • Questions arising from lessons, tasks
  • Slide presentations
  • Checking your work
  • Dynamic graphics

In the live coding session we will work together to solve problems using tools from this week’s lessons. This week’s skills are making slide presentations, checking data and calculations, and producing dynamic graphics.

Today I will work through examples of

  • creating a slide presentation in the “Rpres” format with RStudio, previewing the presentation, saving it as a web page and viewing it in your web browser,
  • some tests on data, especially summarizing categorical variables, and counting missing data,
  • making and using a plotly interactive graph,
  • making and displaying an animated graph using gganimate.

Slide presentations with R

This will be done in a separate document. Watch the recording of the session (or the live session) for more information.

When you save your R presentation, you should see a preview appear. The preview window has a pop-up menu with a “save as webpage” option. Use this to create a file that can be viewed as a slideshow from a web browser.

Testing data

Let’s use the penguins and penguins_raw data sets from the palmerpenguins package.

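First get an overview of the data using three functions from the dlookr package: describe() summarizes each numeric variable (count, number missing, mean, standard deviation, percentiles and more), diagnose() reports the type, number of missing values and number of unique values of every variable, and diagnose_category() tabulates the levels of each categorical variable. This is a carefully cleaned data set, so there are probably no obvious problems with it. The last example applies diagnose_category() to the diamonds data from ggplot2 for comparison.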

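library(tidyverse)       # dplyr, ggplot2 (and the diamonds data)
library(palmerpenguins)  # penguins and penguins_raw
library(dlookr)          # describe(), diagnose(), diagnose_category()
describe(penguins_raw)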
## # A tibble: 7 x 26
##   variable     n    na    mean      sd se_mean     IQR skewness kurtosis     p00
##   <chr>    <int> <int>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl>    <dbl>   <dbl>
## 1 Sample …   344     0   63.2   40.4    2.18   6.62e+1   0.351    -0.926    1   
## 2 Culmen …   342     2   43.9    5.46   0.295  9.27e+0   0.0531   -0.876   32.1 
## 3 Culmen …   342     2   17.2    1.97   0.107  3.10e+0  -0.143    -0.907   13.1 
## 4 Flipper…   342     2  201.    14.1    0.760  2.30e+1   0.346    -0.984  172   
## 5 Body Ma…   342     2 4202.   802.    43.4    1.20e+3   0.470    -0.719 2700   
## 6 Delta 1…   330    14    8.73   0.552  0.0304 8.72e-1   0.239    -0.748    7.63
## 7 Delta 1…   331    13  -25.7    0.794  0.0436 1.26e+0   0.338    -1.03   -27.0 
## # … with 16 more variables: p01 <dbl>, p05 <dbl>, p10 <dbl>, p20 <dbl>,
## #   p25 <dbl>, p30 <dbl>, p40 <dbl>, p50 <dbl>, p60 <dbl>, p70 <dbl>,
## #   p75 <dbl>, p80 <dbl>, p90 <dbl>, p95 <dbl>, p99 <dbl>, p100 <dbl>
diagnose(penguins_raw)
## # A tibble: 17 x 6
##    variables      types   missing_count missing_percent unique_count unique_rate
##    <chr>          <chr>           <int>           <dbl>        <int>       <dbl>
##  1 studyName      charac…             0           0                3     0.00872
##  2 Sample Number  numeric             0           0              152     0.442  
##  3 Species        charac…             0           0                3     0.00872
##  4 Region         charac…             0           0                1     0.00291
##  5 Island         charac…             0           0                3     0.00872
##  6 Stage          charac…             0           0                1     0.00291
##  7 Individual ID  charac…             0           0              190     0.552  
##  8 Clutch Comple… charac…             0           0                2     0.00581
##  9 Date Egg       Date                0           0               50     0.145  
## 10 Culmen Length… numeric             2           0.581          165     0.480  
## 11 Culmen Depth … numeric             2           0.581           81     0.235  
## 12 Flipper Lengt… numeric             2           0.581           56     0.163  
## 13 Body Mass (g)  numeric             2           0.581           95     0.276  
## 14 Sex            charac…            11           3.20             3     0.00872
## 15 Delta 15 N (o… numeric            14           4.07           331     0.962  
## 16 Delta 13 C (o… numeric            13           3.78           332     0.965  
## 17 Comments       charac…           290          84.3             11     0.0320
diagnose_category(penguins)
## # A tibble: 9 x 6
##   variables levels        N  freq ratio  rank
##   <chr>     <fct>     <int> <int> <dbl> <int>
## 1 species   Adelie      344   152 44.2      1
## 2 species   Gentoo      344   124 36.0      2
## 3 species   Chinstrap   344    68 19.8      3
## 4 island    Biscoe      344   168 48.8      1
## 5 island    Dream       344   124 36.0      2
## 6 island    Torgersen   344    52 15.1      3
## 7 sex       male        344   168 48.8      1
## 8 sex       female      344   165 48.0      2
## 9 sex       <NA>        344    11  3.20     3
diagnose_category(diamonds)
## # A tibble: 20 x 6
##    variables levels        N  freq ratio  rank
##    <chr>     <ord>     <int> <int> <dbl> <int>
##  1 cut       Ideal     53940 21551 40.0      1
##  2 cut       Premium   53940 13791 25.6      2
##  3 cut       Very Good 53940 12082 22.4      3
##  4 cut       Good      53940  4906  9.10     4
##  5 cut       Fair      53940  1610  2.98     5
##  6 color     G         53940 11292 20.9      1
##  7 color     E         53940  9797 18.2      2
##  8 color     F         53940  9542 17.7      3
##  9 color     H         53940  8304 15.4      4
## 10 color     D         53940  6775 12.6      5
## 11 color     I         53940  5422 10.1      6
## 12 color     J         53940  2808  5.21     7
## 13 clarity   SI1       53940 13065 24.2      1
## 14 clarity   VS2       53940 12258 22.7      2
## 15 clarity   SI2       53940  9194 17.0      3
## 16 clarity   VS1       53940  8171 15.1      4
## 17 clarity   VVS2      53940  5066  9.39     5
## 18 clarity   VVS1      53940  3655  6.78     6
## 19 clarity   IF        53940  1790  3.32     7
## 20 clarity   I1        53940   741  1.37     8
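The same checks can be done without dlookr. A minimal sketch, assuming the dplyr and palmerpenguins packages are installed: colSums(is.na()) counts the missing values in every column, and count() tabulates the levels of a categorical variable, giving NA its own row. The variable names missing_counts and sex_counts are just illustrative.

```r
library(dplyr)
library(palmerpenguins)

# Number of missing values in each column (compare with diagnose())
missing_counts <- colSums(is.na(penguins))
missing_counts

# Frequency of each level of a categorical variable; NA gets its own row
# (compare with diagnose_category())
sex_counts <- penguins %>% count(sex)
sex_counts

# Levels of every factor column at once
lapply(Filter(is.factor, penguins), levels)
```

The counts should agree with the dlookr output above: 11 birds with no recorded sex and 2 with no body mass.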

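Computing summary statistics (mean, median, etc.) for variables that contain missing data needs na.rm = TRUE; otherwise a single missing value makes the result NA. It is also worth counting the missing and complete values in each group, either with skimr’s helper functions or directly with is.na().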

penguins_raw %>% group_by(Species) %>%
  summarize(mean_flipper_length = mean(`Flipper Length (mm)`, na.rm = TRUE),  # ignore NAs
            n_missing = skimr::n_missing(`Flipper Length (mm)`),     # missing values, via skimr
            n_complete = skimr::n_complete(`Flipper Length (mm)`),   # non-missing values, via skimr
            n = n(),                                                 # all rows in the group
            my_n_missing = sum(is.na(`Flipper Length (mm)`)),        # same counts, by hand
            m_n_complete = sum(!is.na(`Flipper Length (mm)`)))
## # A tibble: 3 x 7
##   Species  mean_flipper_le… n_missing n_complete     n my_n_missing m_n_complete
## * <chr>               <dbl>     <int>      <int> <int>        <int>        <int>
## 1 Adelie …             190.         1        151   152            1          151
## 2 Chinstr…             196.         0         68    68            0           68
## 3 Gentoo …             217.         1        123   124            1          123
penguins_raw %>% filter(!is.na(Sex)) %>%   # drop birds whose sex was not recorded
  group_by(Species, Sex) %>%
  summarize(mean_flipper_length = mean(`Flipper Length (mm)`, na.rm = TRUE),
            n_missing = skimr::n_missing(`Flipper Length (mm)`),
            n_complete = skimr::n_complete(`Flipper Length (mm)`),
            n = n(),
            my_n_missing = sum(is.na(`Flipper Length (mm)`)),
            m_n_complete = sum(!is.na(`Flipper Length (mm)`)))
## `summarise()` has grouped output by 'Species'. You can override using the `.groups` argument.
## # A tibble: 6 x 8
## # Groups:   Species [3]
##   Species         Sex   mean_flipper_le… n_missing n_complete     n my_n_missing
##   <chr>           <chr>            <dbl>     <int>      <int> <int>        <int>
## 1 Adelie Penguin… FEMA…             188.         0         73    73            0
## 2 Adelie Penguin… MALE              192.         0         73    73            0
## 3 Chinstrap peng… FEMA…             192.         0         34    34            0
## 4 Chinstrap peng… MALE              200.         0         34    34            0
## 5 Gentoo penguin… FEMA…             213.         0         58    58            0
## 6 Gentoo penguin… MALE              222.         0         61    61            0
## # … with 1 more variable: m_n_complete <int>
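To see why na.rm = TRUE matters, here is a minimal sketch on a single vector, assuming the palmerpenguins package is installed. The vector name x is only for illustration.

```r
library(palmerpenguins)

x <- penguins$flipper_length_mm

mean(x)                  # NA: a single missing value makes the result missing
mean(x, na.rm = TRUE)    # mean of the non-missing values only
median(x, na.rm = TRUE)  # na.rm works the same way for median(), sd(), etc.
sum(is.na(x))            # number of missing values
sum(!is.na(x))           # number of complete values
```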

Interactive plots with plotly

Use plotly to make a scatterplot. This creates an interactive HTML “widget” that lets the user hover over points to see their values, and zoom and pan the plot.

library(plotly)
plot_ly(penguins, x = ~ body_mass_g, y = ~ flipper_length_mm, color = ~ species)
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plotly.com/r/reference/#scatter
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
## Warning: Ignoring 2 observations
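The messages above appear because plot_ly() had to guess the trace type and mode. A sketch, assuming the ggplot2, plotly and palmerpenguins packages are installed, that sets them explicitly and adds each penguin’s island to the hover label via text. It also shows ggplotly(), which converts an existing ggplot into a plotly widget; p1, p2 and g are illustrative names.

```r
library(ggplot2)
library(plotly)
library(palmerpenguins)

# Explicit trace type and mode, so plotly does not have to guess
p1 <- plot_ly(penguins,
              x = ~body_mass_g, y = ~flipper_length_mm,
              color = ~species,
              type = "scatter", mode = "markers",
              text = ~island)   # island is added to the hover label

# Alternatively, build a ggplot and convert it to a plotly widget
g <- ggplot(penguins, aes(body_mass_g, flipper_length_mm, color = species)) +
  geom_point()
p2 <- ggplotly(g)

p1   # print to display the widget in the Viewer or a knitted document
```

The two penguins with missing measurements are still dropped from both plots.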

Animated plot

The gganimate package allows you to convert a regular ggplot into a series of frames which are then animated. Several functions control how the animation moves (transitions) from one frame to the next:

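  • transition_states, which uses a variable to partition the data into states and tweens between them; a bit like a temporal version of “colour”
  • transition_time, which splits the data by a quantitative variable (such as year) and uses its values to time the movement through the data
  • transition_events, which treats each observation as an event and requires a start time (and optionally an end time) for each one
  • transition_manual, which shows one frame per level of a variable, with no tweening between frames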
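None of the examples below uses transition_events, so here is a small sketch of it, assuming the tidyverse and gganimate packages are installed. The events data are hypothetical: each row is a point that appears at its start time and disappears at its end time, fading in and out via enter_fade() and exit_fade().

```r
library(tidyverse)
library(gganimate)

# Hypothetical toy data: five events, each visible from `start` until `end`
events <- tibble(x = 1:5,
                 y = c(3, 1, 4, 1, 5),
                 start = c(1, 2, 3, 4, 5),
                 end = start + 2)

anim_events <- ggplot(events, aes(x, y)) +
  geom_point(size = 5) +
  # each row enters at its start time and exits at its end time,
  # taking 1 time unit to fade in and 1 to fade out
  transition_events(start = start, end = end,
                    enter_length = 1, exit_length = 1) +
  enter_fade() +
  exit_fade()

anim_events   # printing renders the animation
```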

We’ll use the gapminder and penguins data (and a small made-up data set) to generate some animations. The help pages for the transition_* functions contain many more good examples.

library(gapminder)
library(gganimate)
gapminder %>% ggplot(aes(year, lifeExp)) + geom_line(aes(group = country)) +
  transition_manual(continent)
## nframes and fps adjusted to match transition
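For comparison, transition_time uses year as the clock, so the points move smoothly from one year to the next. This sketch is adapted from the example in the gganimate README and assumes the ggplot2, gganimate and gapminder packages are installed; country_colors is a colour palette supplied by the gapminder package. The {frame_time} placeholder in the title is replaced by the year shown in each frame.

```r
library(ggplot2)
library(gganimate)
library(gapminder)

anim_gap <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(title = "Year: {frame_time}", x = "GDP per capita", y = "life expectancy") +
  transition_time(year) +
  ease_aes("linear")

anim_gap   # printing renders the animation
```

For the transition_manual() plot above, adding labs(title = "{current_frame}") would similarly label each frame with its continent.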

First, a static plot of many lines: eleven lines with intercepts from 0 to 10 and slopes from -1 to 1.

df <- tibble(intercept = 0:10,
             slope = (-5:5)/5,
             group = 0:10)
df %>% ggplot() +
  geom_abline(aes(intercept = intercept, slope = slope)) +
  xlim(-10, 10) + 
  ylim(-2, 12)

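Now animate it, so that a single line appears to change its intercept and slope. transition_time() moves smoothly through the values of group, transition_states() pauses at each value of group and tweens between them, and transition_manual() jumps from one value to the next with no tweening: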

df %>% ggplot() +
  geom_abline(aes(intercept = intercept, slope = slope)) +
  xlim(-10, 10) + 
  ylim(-2, 12) + 
  transition_time(group)

df %>% ggplot() +
  geom_abline(aes(intercept = intercept, slope = slope)) +
  xlim(-10, 10) + 
  ylim(-2, 12) + 
  transition_states(group)

df %>% ggplot() +
  geom_abline(aes(intercept = intercept, slope = slope)) +
  xlim(-10, 10) + 
  ylim(-2, 12) + 
  transition_manual(group) 
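Printing an animation renders it with default settings. A sketch, assuming the tidyverse and gganimate packages are installed, of tuning transition_states (how long to tween versus how long to hold each state), labelling frames with {closest_state}, and rendering explicitly with animate() before saving with anim_save(). Rendering to GIF needs the gifski package, so that step is guarded; the file name lines.gif is only an example.

```r
library(tidyverse)
library(gganimate)

df <- tibble(intercept = 0:10,
             slope = (-5:5)/5,
             group = 0:10)

anim_lines <- df %>% ggplot() +
  geom_abline(aes(intercept = intercept, slope = slope)) +
  xlim(-10, 10) +
  ylim(-2, 12) +
  # tween for 2 units between states, then hold each state for 1 unit
  transition_states(group, transition_length = 2, state_length = 1) +
  labs(title = "group: {closest_state}")

# Render with a chosen number of frames and frame rate, then save as a GIF
if (requireNamespace("gifski", quietly = TRUE)) {
  gif <- animate(anim_lines, nframes = 60, fps = 10)
  anim_save(file.path(tempdir(), "lines.gif"), animation = gif)
}
```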
## nframes and fps adjusted to match transition

Try this for the penguins data:

penguins %>% ggplot(aes(body_mass_g, flipper_length_mm, color = sex)) + geom_point() +
  transition_states(species)  # or try transition_manual(species)
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 2 rows containing missing values (geom_point).
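The repeated warnings come from the two penguins with missing body mass and flipper length: geom_point() drops them again in every rendered frame that includes them. A sketch, assuming the tidyverse, gganimate and palmerpenguins packages are installed, that removes those rows once before plotting and labels each frame with the current species. Birds with missing sex are kept; they are drawn in the NA colour.

```r
library(tidyverse)
library(gganimate)
library(palmerpenguins)

# Drop rows with missing measurements once, before plotting,
# so geom_point() has nothing to remove in each frame
penguins_complete <- penguins %>%
  drop_na(body_mass_g, flipper_length_mm)

anim_penguins <- penguins_complete %>%
  ggplot(aes(body_mass_g, flipper_length_mm, color = sex)) +
  geom_point() +
  labs(title = "Species: {closest_state}") +
  transition_states(species)

anim_penguins   # printing renders the animation
```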